Search Results: "undef"

23 February 2015

Enrico Zini: akonadi-build-hth

The wonders of missing documentation Update: I have managed to build an example Akonadi client application. I'm new here, I want to make a simple C++ GUI app that pops up a QCalendarWidget which my local Akonadi has appointments. I open qtcreator, create a new app, hack away for a while, then of course I get undefined references for all Akonadi symbols, since I didn't tell the build system that I'm building with akonadi. Ok. How do I tell the build system that I'm building with akonadi? After 20 minutes of frantic looking around the internet, I still have no idea. There is a package called libakonadi-dev which does not seem to have anything to do with this. That page mentions everything about making applications with Akonadi except how to build them. There is a package called kdepimlibs5-dev which looks promising: it has no .a files but it does have haders and cmake files. However, qtcreator is only integrated with qmake, and I would really like the handholding of an IDE at this stage. I put something together naively doing just what looked right, and I managed to get an application that segfaults before main() is even called:

/*
 * Copyright   2015 Enrico Zini <enrico@enricozini.org>
 *
 * This work is free. You can redistribute it and/or modify it under the
 * terms of the Do What The Fuck You Want To Public License, Version 2,
 * as published by Sam Hocevar. See the COPYING file for more details.
 */
#include <QDebug>
int main(int argc, char *argv[])
 
    qDebug() << "BEGIN";
    return 0;

QT       += core gui widgets
CONFIG += c++11
TARGET = wtf
TEMPLATE = app
LIBS += -lkdecore -lakonadi-kde
SOURCES += wtf.cpp

I didn't achieve what I wanted, but I feel like I achieved something magical and beautiful after all. I shall now perform some haruspicy on those oscure cmake files to see if I can figure something out. But seriously, people?

24 January 2015

Dirk Eddelbuettel: RcppAnnoy 0.0.5

A new version of RcppAnnoy is now on CRAN. RcppAnnoy wraps the small, fast, and lightweight C++ template header library Annoy written by Erik Bernhardsson for use at Spotify. RcppAnnoy uses Rcpp Modules to offer the exact same functionality as the Python module wrapped around Annoy. This version contains a trivial one-character change requested by CRAN to cleanse the Makevars file of possible GNU Make-isms. Oh well. This release also overcomes an undefined behaviour sanitizer bug noticed by CRAN that took somewhat more effort to deal with. As mentioned recently in another blog post, it took some work to create a proper Docker container with the required compiler and subsequent R setup, but we have one now, and the aforementioned blog post has details on how we replicated the CRAN finding of an UBSAN issue. It also took Erik some extra efforts to set something up for his C++/Python side, but eventually an EC2 instance with Ubuntu 14.10 did the task as my Docker sales skills are seemingly not convincing enough. In any event, he very quickly added the right fix, and I synced RcppAnnoy with his Annoy code. Courtesy of CRANberries, there is also a diffstat report for this release. More detailed information is on the RcppAnnoy page page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

18 January 2015

Dirk Eddelbuettel: Running UBSAN tests via clang with Rocker

Every now and then we get reports from CRAN about our packages failing a test there. A challenging one concerns UBSAN, or Undefined Behaviour Sanitizer. For background on UBSAN, see this RedHat blog post for gcc and this one from LLVM about clang. I had written briefly about this before in a blog post introducing the sanitizers package for tests, as well as the corresponding package page for sanitizers, which clearly predates our follow-up Rocker.org repo / project described in this initial announcement and when we became the official R container for Docker. Rocker had support for SAN testing, but UBSAN was not working yet. So following a recent CRAN report against our RcppAnnoy package, I was unable to replicate the error and asked for help on r-devel in this thread. Martyn Plummer and Jan van der Laan kindly sent their configurations in the same thread and off-list; Jeff Horner did so too following an initial tweet offering help. None of these worked for me, but further trials eventually lead me to the (already mentioned above) RedHat blog post with its mention of -fno-sanitize-recover to actually have an error abort a test. Which, coupled with the settings used by Martyn, were what worked for me: clang-3.5 -fsanitize=undefined -fno-sanitize=float-divide-by-zero,vptr,function -fno-sanitize-recover. This is now part of the updated Dockerfile of the R-devel-SAN-Clang repo behind the r-devel-ubsan-clang. It contains these settings, as well a new support script check.r for littler---which enables testing right out the box. Here is a complete example:

docker                              # run Docker (any recent version, I use 1.2.0)
  run                               # launch a container 
    --rm                            # remove Docker temporary objects when dome
    -ti                             # use a terminal and interactive mode 
    -v $(pwd):/mnt                  # mount the current directory as /mnt in the container
    rocker/r-devel-ubsan-clang      # using the rocker/r-devel-ubsan-clang container
  check.r                           # launch the check.r command from littler (in the container)
    --setwd /mnt                    # with a setwd() to the /mnt directory
    --install-deps                  # installing all package dependencies before the test
    RcppAnnoy_0.0.5.tar.gz          # and test this tarball

I know. It is a mouthful. But it really is merely the standard practice of running Docker to launch a single command. And while I frequently make this the /bin/bash command (hence the -ti options I always use) to work and explore interactively, here we do one better thanks to the (pretty useful so far) check.r script I wrote over the last two days. check.r does about the same as R CMD check. If you look inside check you will see a call to a (non-exported) function from the (R base-internal) tools package. We call the same function here. But to make things more interesting we also first install the package we test to really ensure we have all build-dependencies from CRAN met. (And we plan to extend check.r to support additional apt-get calls in case other libraries etc are needed.) We use the dependencies=TRUE option to have R smartly install Suggests: as well, but only one level (see help(install.packages) for details. With that prerequisite out of the way, the test can proceed as if we had done R CMD check (and additional R CMD INSTALL as well). The result for this (known-bad) package:

edd@max:~/git$ docker run --rm -ti -v $(pwd):/mnt rocker/r-devel-ubsan-clang check.r --setwd /mnt --install-deps RcppAnnoy_0.0.5.tar.gz 
also installing the dependencies  Rcpp ,  BH ,  RUnit 
trying URL 'http://cran.rstudio.com/src/contrib/Rcpp_0.11.3.tar.gz'
Content type 'application/x-gzip' length 2169583 bytes (2.1 MB)
opened URL
==================================================
downloaded 2.1 MB
trying URL 'http://cran.rstudio.com/src/contrib/BH_1.55.0-3.tar.gz'
Content type 'application/x-gzip' length 7860141 bytes (7.5 MB)
opened URL
==================================================
downloaded 7.5 MB
trying URL 'http://cran.rstudio.com/src/contrib/RUnit_0.4.28.tar.gz'
Content type 'application/x-gzip' length 322486 bytes (314 KB)
opened URL
==================================================
downloaded 314 KB
trying URL 'http://cran.rstudio.com/src/contrib/RcppAnnoy_0.0.4.tar.gz'
Content type 'application/x-gzip' length 25777 bytes (25 KB)
opened URL
==================================================
downloaded 25 KB
* installing *source* package  Rcpp  ...
** package  Rcpp  successfully unpacked and MD5 sums checked
** libs
clang++-3.5 -fsanitize=undefined -fno-sanitize=float-divide-by-zero,vptr,function -fno-sanitize-recover -I/usr/local/lib/R/include -DNDEBUG -I../inst/include/ -I/usr/local/include    -fpic  -pipe -Wall -pedantic -
g  -c Date.cpp -o Date.o
[...]
* checking examples ... OK
* checking for unstated dependencies in  tests  ... OK
* checking tests ...
  Running  runUnitTests.R 
 ERROR
Running the tests in  tests/runUnitTests.R  failed.
Last 13 lines of output:
  +     if (getErrors(tests)$nFail > 0)  
  +         stop("TEST FAILED!")
  +      
  +     if (getErrors(tests)$nErr > 0)  
  +         stop("TEST HAD ERRORS!")
  +      
  +     if (getErrors(tests)$nTestFunc < 1)  
  +         stop("NO TEST FUNCTIONS RUN!")
  +      
  +  
  
  
  Executing test function test01getNNsByVector  ... ../inst/include/annoylib.h:532:40: runtime error: index 3 out of bounds for type 'int const[2]'
* checking PDF version of manual ... OK
* DONE
Status: 1 ERROR, 2 WARNINGs, 1 NOTE
See
   /tmp/RcppAnnoy/..Rcheck/00check.log 
for details.
root@a7687c014e55:/tmp/RcppAnnoy#

The log shows that thanks to check.r, we first download and the install the required packages Rcpp, BH, RUnit and RcppAnnoy itself (in the CRAN release). Rcpp is installed first, we then cut out the middle until we get to ... the failure we set out to confirm. Now having a tool to confirm the error, we can work on improved code. One such fix currently under inspection in a non-release version 0.0.5.1 then passes with the exact same invocation (but pointing at RcppAnnoy_0.0.5.1.tar.gz):

edd@max:~/git$ docker run --rm -ti -v $(pwd):/mnt rocker/r-devel-ubsan-clang check.r --setwd /mnt --install-deps RcppAnnoy_0.0.5.1.tar.gz
also installing the dependencies  Rcpp ,  BH ,  RUnit 
[...]
* checking examples ... OK
* checking for unstated dependencies in  tests  ... OK
* checking tests ...
  Running  runUnitTests.R 
 OK
* checking PDF version of manual ... OK
* DONE
Status: 1 WARNING
See
   /mnt/RcppAnnoy.Rcheck/00check.log 
for details.
edd@max:~/git$

This proceeds the same way from the same pristine, clean container for testing. It first installs the four required packages, and the proceeds to test the new and improved tarball. Which passes the test which failed above with no issues. Good. So we now have an "appliance" container anybody can download from free from the Docker hub, and deploy as we did here in order to have fully automated, one-command setup for testing for UBSAN errors. UBSAN is a very powerful tool. We are only beginning to deploy it. There are many more useful configuration settings. I would love to hear from anyone who would like to work on building this out via the R-devel-SAN-Clang GitHub repo. Improvements to the littler scripts are similarly welcome (and I plan on releasing an updated littler package "soon").

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

31 December 2014

Wouter Verhelst: Perl 'issues'

I just watched a CCC talk in which the speaker claims Perl is horribly broken. Watching it was fairly annoying however, since I had to restrain myself from throwing things at the screen. If you're going to complain about the language, better make sure you actually understand the language first. I won't deny that there are a few weird constructions in there, but hey. The talk boils down to a claim that perl is horrible, because the list "data type" is "broken". First of all, Netanel, in Perl, lists are not arrays. Yes, that's confusing if you haven't done more than a few hours of Perl, but hear me out. In Perl, a list is an enumeration of values. A variable with an '@' sigil is an array; a construct consisting of an opening bracket ('(') followed by a number of comma- or arrow-separated values (',' or '=>'), followed by a closing bracket, is a list. Whenever you assign more than one value to an array or a hash, you need to use a list to enumerate the values. Subroutines in perl also use lists as arguments or return values. Yes, that last bit may have been a mistake. Perl has a concept of "scalar context" and "list context". A scalar context is what a sub is in when you assign the return value of your sub to a scalar; a list context is when you assign the return value of your sub to an array or a hash, or when you use the list construct (the thing with brackets and commas) with sub calls (instead of hardcoded values or variables) as the individual values. This works as follows:

sub magic  
    if (wantarray())  
        print "You're asking for a list!";
        return ('a', 'b', 'c');
      else  
        print "You're asking for a scalar!";
        return 'a';
     
 
print ("list: ", magic(), "\n");
print "scalar: " . magic() . "\n";

The above example will produce the following output:

You're asking for a list!
list: abc
You're asking for a scalar!
scalar: a

What happens here? The first print line creates a list (because things are separated by commas); the second one does not (the '.' is perl's string concatenation operator; as you can only concatenate scalars, the result is that you call the magic() sub in scalar context). Yes, seen as how arrays are not lists, the name of the wantarray() sub is horribly chosen. Anyway. It is documented that lists cannot be nested. Lists can only be one-dimensional constructs. If you create a list, and add another list as an element (or something that can be converted to a list, like an array or a hash), then the result is that you get a flattened list. If you don't want a flattened list, you need to use a reference instead. A reference is a scalar value that, very much like a pointer in C, contains a reference to another variable. This other variable can be an array, a hash, or a scalar. But it cannot be a list, because it must be a variable -- and lists cannot be variables. If you need to create multi-dimensional constructs, you need to use references. Taking a reference is done by prepending a backslash to whatever it is you're trying to take a reference of; or, in the case of arrays of hashes, one can create an anonymous array or hash with [] resp . E.g., if you want to add a non-flattened array to a list, you instead create a reference to an array, like so:

$arrayref = [ 'this', 'is', 'an', 'anonymous', 'array'];

you can now create a multi-dimensional construct:

@multiarray = ('elem1', $arrayref);

Or you can do that in one go:

@multiarray = ('elem1', [ 'this', 'is', 'an', 'anonymous', 'array']);

Alternatively, you can create a non-anonymous array first:

@onedimarray = ('this', 'is', 'not', 'an', 'anonymous', 'array');
@multiarray = ('elem1', \@onedimarray);

In perl, curly brackets can be used to create a reference to anonymous hashes, whereas square brackets can be used to create a reference to anonymous arrays. This is all a basic part of the language; if you don't understand that, you simply don't understand Perl. In other words, whenever you see someone doing this:

%hash =  'a' => 'b' ;

@array = [ '1', '2' ];

you can say that they don't understand the language. For reference, the assignment to %hash will result in an (unusable) hash with a single key that is a reference to an anonymous hash (which cannot be accessed anymore) and a value of undef; the assignment to @array will result in a two-dimensional array with one element in the first dimension, and two elements in the second. The CGI.pm fix which Natanel dismisses in the Q&A part of the talk as a "warning" which won't help (because it would be too late) is actually a proper fix, which should warn people in all cases. That is, if you do this:

%hash =   'name' => $name, 'password' => $cgi->param('password')  ;

then CGI.pm's param() sub will notice that it's being called in list context, and issue a warning -- regardless of whether the user is passing one or two password query-parameters. It uses the wantarray() sub, and produces a warning if that returns true. In short, Perl is not the horribly broken construct that Natanel claims it to be. Yes, there are a few surprises (most of which exist for historical reasons), and yes, those should be fixed. This is why the Perl community has redone much of perl for Perl 6. But the fact that there are a few surprises doesn't mean the whole language is broken. There are surprises in most languages; that is a fact of life. Yes, the difference between arrays and hashes on the one hand, and lists on the other hand, is fairly confusing; it took me a while to understand this. But once you get the hang of it, it's not all that difficult. And then these two issues that Natanel found (which I suppose could be described as bugs in the core modules) aren't all that surprising anymore. So, in short:

Don't stop using Perl. However, do make sure that whenever you use a language, you understand the language, first, so you don't get bitten by its historical baggage. This is true for any language, not just Perl.
Don't assume that just because you found issues with core modules, the whole language is suddenly broken.

What I do agree with is that if you want to use a language, you should understand its features. Unfortunately, this single line in the final slide of Natanel's talk is just about the only thing in the whole talk that sortof made sense to me. Ah well.

26 October 2014

Gregor Herrmann: RC bugs 2014/38-43

it's this time of the year^Wrelease cycle again almost. in ten days (& roughly two hours), the freeze for the next debian release, codenamed jessie, will start. by this time packages must be in testing in order to be candidates for the release, as explained in the release team's detailed freeze policy. this also means, with the regular testing migration time set to ten days, that tonight's dinstall run closed the regular upload window. & this also means that we should all concentrate on fixing RC bugs to make the freeze as short as possible & jessie yet another great release. before I head over to the UDD bugs page, I'd like to summarize my work on RC bugs in the last weeks, which was again focussed on packages in the Debian Perl Group.

~~#736739~~ src:lemonldap-ng: "[src:lemonldap-ng] Sourceless file"
upload new upstream release prepared by Xavier Guimard (pkg-perl)
~~#736807~~ src:lemonldap-ng: "[src:lemonldap-ng] Non free file"
upload new upstream release prepared by Xavier Guimard (pkg-perl)
~~#742409~~ libsereal-encoder-perl: "libsereal-encoder-perl: FTBFS on some architectures"
upload new upstream release, with patch from ntyni (pkg-perl)
#755317 src:libnet-bonjour-perl: "libnet-bonjour-perl: FTBFS: Tests failures"
lower severity (pkg-perl)
~~#755328~~ src:libgraph-writer-graphviz-perl: "libgraph-writer-graphviz-perl: FTBFS: Tests failures"
update patches for test suite (pkg-perl)
~~#759966~~ src:libvideo-fourcc-info-perl: "libvideo-fourcc-info-perl: FTBFS: dh_auto_test: perl Build test returned exit code 255"
close bug, fixed in ~~#762334~~ (pkg-perl)
~~#762333~~ libcgi-application-plugin-ajaxupload-perl: "libcgi-application-plugin-ajaxupload-perl: FTBFS with libjson-any-perl 1.36-1: test failures"
close, as the bug is fixed in libpackage-stash-perl, cf. ~~#762334~~ (pkg-perl)
~~#763254~~ src:libcrypt-gcrypt-perl: "libcrypt-gcrypt-perl: FTBFS: GCrypt.xs:59:5: error: unknown type name 'gcry_ac_handle_t'"
add patch from CPAN RT (pkg-perl)
~~#765053~~ libapache-dbilogger-perl: "libapache-dbilogger-perl: FTBFS - undefined symbol: modperl_is_running"
close, as the bug is fixed in libapache2-mod-perl2, cf. ~~#765174~~ (pkg-perl)
~~#765137~~ src:libcgi-fast-perl: "libcgi-fast-perl: FTBFS: Tests failures"
upload new upstream release (pkg-perl)
#765150 src:libhtml-formfu-perl: "libhtml-formfu-perl: FTBFS: Tests failures"
lower severity (pkg-perl)
~~#765165~~ liblog-dispatch-perl: "liblog-dispatch-perl: missing dependency/recommendation on libdevel-globaldestruction-perl"
add missing (build) dependency (pkg-perl)

27 September 2014

Niels Thykier: Lintian Upcoming API making it easier to write correct and safe code

The upcoming version of Lintian will feature a new set of API that attempts to promote safer code. It is hardly a ground-breaking discovery , just a much needed feature. The primary reason for this API is that writing safe and correct code is simply too complicated that people get it wrong (including yours truly on occasion). The second reason is that I feel it is a waste having to repeat myself when reviewing patches for Lintian. Fortunately, the kind of issues this kind of mistake creates are usually minor information leaks, often with no chance of exploiting it remotely without the owner reviewing the output first[0]. Part of the complexity of writing correct code originates from the fact that Lintian must assume Debian packages to be hostile until otherwise proven[1]. Consider a simplified case where we want to read a file (e.g. the copyright file):

package Lintian::cpy_check;
use strict; use warnings; use autodie;
sub run  
  my ($pkg, undef, $info) = @_;
  my $filename = "usr/share/doc/$pkg/copyright";
  # BAD: This is an example of doing it wrong
  open(my $fd, '<', $info->unpacked($filename));
  ...;
  close($fd);
  return;

This has two trivial vulnerabilities[2].

Any part of the path (usr,usr/share, ) can be asymlink to somewhere else like /
1. Problem: Access to potentially any file on the system with the credentials of the user running Lintian. But even then, Lintian generally never write to those files and the user has to (usually manually) disclose the report before any information leak can be completed.
The target path can point to a non-file.
1. Problem: Minor inconvenience by DoS of Lintian. Examples include a named pipe, where Lintian will get stuck until a signal kills it.

Of course, we can do this right[3]:

package Lintian::cpy_check;
use strict; use warnings; use autodie;
use Lintian::Util qw(is_ancestor_of);
sub run  
  my ($pkg, undef, $info) = @_;
  my $filename = "usr/share/doc/$pkg/copyright";
  my $root = $info->unpacked
  my $path = $info->unpacked($filename);
  if ( -f $path and is_ancestor_of($root, $path))  
    open(my $fd, '<', $path);
    ...;
    close($fd);
   
  return;

Where is_ancestor_of is the only available utility to assist you currently. It hides away some 10-12 lines of code to resolve the two paths and correctly asserting that $path is (an ancestor of) $root. Prior to Lintian 2.5.12, you would have to do that ancestor check by hand in each and every check[4]. In the new version, the correct code would look something like this:

package Lintian::cpy_check;
use strict; use warnings; use autodie;
sub run  
  my ($pkg, undef, $info) = @_;
  my $filename = "usr/share/doc/$pkg/copyright";
  my $path = $info->index_resolved_path($filename);
  if ($path and $path->is_open_ok)  
    my $fd = $path->open;
    ...;
    close($fd);
   
  return;

Now, you may wonder how that promotes safer code. At first glance, the checking code is not a lot simpler than the previous correct example. However, the new code has the advantage of being safer even if you forget the checks. The reasons are:

The return value is entirely based on the file index of the package (think: tar vtf data.tar.gz). At no point does it use the file system to resolve the path. Whether your malicious package trigger an undef warning based on the return value of index_resolved_index leaks nothing about the host machine.
1. However, it does take safe symlinks into account and resolves them for you. If you ask for foo/bar and foo is a symlink to baz and baz/bar exists in the package, you will get baz/bar . If baz/bar happens to be a symlink, then it is resolved as well.
2. Bonus: You are much more likely to trigger the undef warning during regular testing, since it also happens if the file is simply missing.
If you attempt to call $path->open without calling $path->is_open_ok first, Lintian can now validate the call for you and stop it on unsafe actions.

It also has the advantage of centralising the code for asserting safe access, so bugs in it only needs to be fixed in one place. Of course, it is still possible to write unsafe code. But at least, the new API is safer by default and (hopefully) more convenient to use. [0] Lintian.debian.org being the primary exception here. [1] This is in contrast to e.g. piuparts, which very much trusts its input packages by handing the package root access (albeit chroot ed, but still). [2] And also a bug. Not all binary packages have a copyright instead some while have a symlink to another package. [3] The code is hand-typed into the blog without prior testing (not even compile testing it). The code may be subject to typos, brown-paper-bag bugs etc. which are all disclaimed (of course). [4] Fun fact, our documented example for doing it correctly prior to implementing is_ancestor_of was in fact not correct. It used the root path in a regex (without quoting the root path) fortunately, it just broke lintian when your TMPDIR / LINTIAN_LAB contained certain regex meta-characters (which is pretty rare).

16 September 2014

Matthew Garrett: ACPI, kernels and contracts with firmware

ACPI is a complicated specification - the latest version is 980 pages long. But that's because it's trying to define something complicated: an entire interface for abstracting away hardware details and making it easier for an unmodified OS to boot diverse platforms.

Inevitably, though, it can't define the full behaviour of an ACPI system. It doesn't explicitly state what should happen if you violate the spec, for instance. Obviously, in a just and fair world, no systems would violate the spec. But in the grim meathook future that we actually inhabit, systems do. We lack the technology to go back in time and retroactively prevent this, and so we're forced to deal with making these systems work.

This ends up being a pain in the neck in the x86 world, but it could be much worse. Way back in 2008 I wrote something about why the Linux kernel reports itself to firmware as "Windows" but refuses to identify itself as Linux. The short version is that "Linux" doesn't actually identify the behaviour of the kernel in a meaningful way. "Linux" doesn't tell you whether the kernel can deal with buffers being passed when the spec says it should be a package. "Linux" doesn't tell you whether the OS knows how to deal with an HPET. "Linux" doesn't tell you whether the OS can reinitialise graphics hardware.

Back then I was writing from the perspective of the firmware changing its behaviour in response to the OS, but it turns out that it's also relevant from the perspective of the OS changing its behaviour in response to the firmware. Windows 8 handles backlights differently to older versions. Firmware that's intended to support Windows 8 may expect this behaviour. If the OS tells the firmware that it's compatible with Windows 8, the OS has to behave compatibly with Windows 8.

In essence, if the firmware asks for Windows 8 support and the OS says yes, the OS is forming a contract with the firmware that it will behave in a specific way. If Windows 8 allows certain spec violations, the OS must permit those violations. If Windows 8 makes certain ACPI calls in a certain order, the OS must make those calls in the same order. Any firmware bug that is triggered by the OS not behaving identically to Windows 8 must be dealt with by modifying the OS to behave like Windows 8.

This sounds horrifying, but it's actually important. The existence of well-defined[1] OS behaviours means that the industry has something to target. Vendors test their hardware against Windows, and because Windows has consistent behaviour within a version[2] the vendors know that their machines won't suddenly stop working after an update. Linux benefits from this because we know that we can make hardware work as long as we're compatible with the Windows behaviour.

That's fine for x86. But remember when I said it could be worse? What if there were a platform that Microsoft weren't targeting? A platform where Linux was the dominant OS? A platform where vendors all test their hardware against Linux and expect it to have a consistent ACPI implementation?

Our even grimmer meathook future welcomes ARM to the ACPI world.

Software development is hard, and firmware development is software development with worse compilers. Firmware is inevitably going to rely on undefined behaviour. It's going to make assumptions about ordering. It's going to mishandle some cases. And it's the operating system's job to handle that. On x86 we know that systems are tested against Windows, and so we simply implement that behaviour. On ARM, we don't have that convenient reference. We are the reference. And that means that systems will end up accidentally depending on Linux-specific behaviour. Which means that if we ever change that behaviour, those systems will break.

So far we've resisted calls for Linux to provide a contract to the firmware in the way that Windows does, simply because there's been no need to - we can just implement the same contract as Windows. How are we going to manage this on ARM? The worst case scenario is that a system is tested against, say, Linux 3.19 and works fine. We make a change in 3.21 that breaks this system, but nobody notices at the time. Another system is tested against 3.21 and works fine. A few months later somebody finally notices that 3.21 broke their system and the change gets reverted, but oh no! Reverting it breaks the other system. What do we do now? The systems aren't telling us which behaviour they expect, so we're left with the prospect of adding machine-specific quirks. This isn't scalable.

Supporting ACPI on ARM means developing a sense of discipline around ACPI development that we simply haven't had so far. If we want to avoid breaking systems we have two options:

1) Commit to never modifying the ACPI behaviour of Linux.
2) Exposing an interface that indicates which well-defined ACPI behaviour a specific kernel implements, and bumping that whenever an incompatible change is made. Backward compatibility paths will be required if firmware only supports an older interface.

(1) is unlikely to be practical, but (2) isn't a great deal easier. Somebody is going to need to take responsibility for tracking ACPI behaviour and incrementing the exported interface whenever it changes, and we need to know who that's going to be before any of these systems start shipping. The alternative is a sea of ARM devices that only run specific kernel versions, which is exactly the scenario that ACPI was supposed to be fixing.

[1] Defined by implementation, not defined by specification
[2] Windows may change behaviour between versions, but always adds a new _OSI string when it does so. It can then modify its behaviour depending on whether the firmware knows about later versions of Windows.

comments

11 September 2014

Sylvestre Ledru: Rebuild of Debian using Clang 3.5.0

Clang 3.5.0 has just been released. A new rebuild has been done highlight the progress to get Debian built with clang. tl;dr: Great progress. We decreased from 9.5% to 5.7% of failures. Full results are available on http://clang.debian.net At time of the rebuild with 3.4.2, we had 2040 packages failing to build with clang. With 3.5.0, this dropped to 1261 packages. Fixes With Arthur Marble and Alexander Ovchinnikov, both GSoC students, we worked on various ways to decrease the number of errors. Upstream fixes First, the most obvious way, we fixed programming bugs/mistakes in upstream sources. Basically, we took categories of failure and fixed issues one after the other. We started with simple bugs like 'Wrong main declaration', 'non-void function should return a value' or 'Void function should not return a value'.

They are trivial to fix. We continued with harder fixes like ' Undefined reference' or 'Variable length array for a non POD (plain old data) element'.

So, besides these one, we worked on:

In total, we reported 295 bugs with patches. 85 of them have been fixed (meaning that the Debian maintainer uploaded a new version with the fix).

In parallel, I think that the switch by FreeBSD and Mac OS X to Clang also helped to fix various issues by upstreams. Hacking in clang As a parallel approach, we started to implement a suggestion from Linus Torvalds and a few others. Instead of trying to fix all upstream, where we can, we tried to update clang to improve the gcc compatibility.

gcc has many flags to disable or enable optimizations. Some of them are legacy, others have no sense in clang, etc. Instead of failing in clang with an error, we create a new category of warnings (showing optimization flag '%0' is not supported) and moved all relevant flags into it. Some examples, r212805, r213365, r214906 or r214907

We also updated clang to silent some useless arguments like -finput_charset=UTF-8 (r212110), clang being UTF-8 compliant.

Finally, we worked on the forwarding of linker flags. Clang and gcc have a very different behavior: when gcc does not know an argument, it is going to forward the argument to the linker. Clang, in this case, is going to reject the argument and fail with an error. In clang, we have to explicitly declare which arguments are going to be transfer to the linker. Of course, the correct way to pass arguments to the linker is to use -Xlinker or -Wl but the Debian rebuild proved that these shortcuts are used. Two of these arguments are now forwarded:

-z keyword - r213198
-u Force symbol to be entered in the output file as an undefined symbol - r211756. This one fixed most of the haskell build failures. It fixed the most common issue that we had (701 occurrences but this does not mean that all these packages build fine now, some haskell-based package are failing later in the process)

New errors Just like in other releases, new warnings are added in clang. With (bad) usage of -Werror by upstream software, this causes new build failures:

Absolute value error - 6 occurences
Tautological pointer comparison - 10 occurences

I also took the opportunity to add some further categorizations in the list of errors. Some examples:

Next steps The Debile project being close to ready with Cl ment Schreiner's GSoC, we will now have an automatic and transparent way to rebuild packages using clang. Conclusion As stated, we can see a huge drop in term of number of failures over time:

Hopefully, Clang getting better and better, more and more projects adopting it as the default compiler or as a base for plugin/extension developments, this percentage will continue to decrease.
Having some kind of release goal with clang for Jessie+1 can now be considered as potentially reachable. Want to help? There are several things which can be done to help:

Point me common error patterns in the Not categorized list of errors to create new categories
Report and fix packages
As an upstream, integrate clang as part of your continuous integration system
Hack on cqa-scanlogs, the error detection tool to detect error patterns (example: Undetected error). This tool is used also for the regular rebuilds of the archive.
Improve clang.debian.net website

Acknowledgments Thanks to David Suarez for the rebuilds of the archive, Arthur Marble and Alexander Ovchinnikov for their GSoC works and Nicolas S velin-Radiguet for the few fixes.

Original post blogged on b2evolution.

21 August 2014

Wouter Verhelst: Multiarchified eID libraries, now public

Yesterday, I spent most of the day finishing up the multiarch work I'd been doing on introducing multiarch to the eID middleware, and did another release of the Linux builds. As such, it's now possible to install 32-bit versions of the eID middleware on a 64-bit Linux distribution. For more details, please see the announcement. Learning how to do multiarch (or biarch, as the case may be) for three different distribution families has been a, well, learning experience. Being a Debian Developer, figuring out the technical details for doing this on Debian and its derivatives wasn't all that hard. You just make sure the libraries are installed to the multiarch-safe directories (i.e., /usr/lib/<gnu arch triplet>), you add some Multi-Arch: foreign or Multi-Arch: same headers where appropriate, and you're done. Of course the devil is in the details (define "where appropriate"), but all in all it's not that difficult and fairly deterministic. The Fedora (and derivatives, like RHEL) approach to biarch is that 64-bit distributions install into /usr/lib64 and 32-bit distributions install into /usr/lib. This goes for any architecture family, not just the x86 family; the same method works on ppc and ppc64. However, since fedora doesn't do powerpc anymore, that part is a detail of little relevance. Once that's done, yum has some heuristics whereby it will prefer native-architecture versions of binaries when asked, and may install both the native-architecture and foreign-architecture version of a particular library package at the same time. Since RPM already has support for installing multiple versions of the same package on the same system (a feature that was originally created, AIUI, to support the installation of multiple kernel versions), that's really all there is to it. It feels a bit fiddly and somewhat fragile, since there isn't really a spec and some parts seem fairly undefined, but all in all it seems to work well enough in practice. The openSUSE approach is vastly different to the other two. Rather than installing the foreign-architecture packages natively, as in the Debian and Fedora approaches, openSUSE wants you to take the native foo.ix86.rpm package and convert that to a foo-32bit.x86_64.rpm package. The conversion process filters out non-unique files (only allows files to remain in the package if they are in library directories, IIUC), and copes with the lack of license files in /usr/share/doc by adding a dependency header on the native package. While the approach works, it feels like unnecessary extra work and bandwidth to me, and obviously also wouldn't scale beyond biarch. It also isn't documented very well; when I went to openSUSE IRC channels and started asking questions, the reply was something along the lines of "hand this configuration file to your OBS instance". When I told them I wasn't actually using OBS and had no plans of migrating to it (because my current setup is complex enough as it is, and replacing it would be far too much work for too little gain), it suddenly got eerily quiet. Eventually I found out that the part of OBS which does the actual build is a separate codebase, and integrating just that part into my existing build system was not that hard to do, even though it doesn't come with a specfile or RPM package and wants to install files into /usr/bin and /usr/lib. With all that and some more weirdness I've found in the past few months that I've been building packages for openSUSE I now have... Ideas(TM) about how openSUSE does things. That's for another time, though. (disclaimer: there's a reason why I'm posting this on my personal blog and not on an official website... don't take this as an official statement of any sort!)

3 August 2014

Dirk Eddelbuettel: Introducing sanitizers 0.1.0

A new package sanitizers is now on CRAN. It provides test cases for Address Sanitizers, and Undefined Behaviour Sanitizers. These are two recent features of both g++ and clang++, and described in the Checking Memory Access section of the Writing R Extension manual. I set up a new web page for the sanitizers package which illustrates their use case via pre-built Docker images, similar to what I presented at the end of my useR! 2014 keynote a few weeks ago. So instead of repeating this over here, I invite you to read the detailed discussion on the sanitizers page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 July 2014

Juliana Louback: Extending an xTuple Business Object

xTuple is in my opinion incredibly well designed; the code is clean and the architecture ahderent to a standardized structure. All this makes working with xTuple software quite a breeze. I wanted to integrate JSCommunicator into the web-based xTuple version. JSCommunicator is a SIP communication tool, so my first step was to create an extension for the SIP account data. Luckily for me, the xTuple development team published an awesome tutorial for writing an xTuple extension. xTuple cleverly uses model based business objects for the various features available. This makes customizing xTuple very straightforward. I used the tutorial mentioned above for writing my extension, but soon noticed my goals were a little different. A SIP account has 3 data fields, these being the SIP URI, the account password and an optional display name. xTuple currently has a business object in the core code for a User Account and it would make a lot more sense to simply add my 3 fields to this existing business object rather than create another business object. The tutorial very clearly shows how to extend a business object with another business object, but not how to extend a business object with only new fields (not a whole new object). Now maybe I m just a whole lot slower than most people, but I had a ridiculously had time figuring this out. Mind you, this is because I m slow, because the xTuple documentation and code is understandable and as self-explanatory as it gets. I think it just takes a bit to get used to. Either way, I thought this just might be useful to others so here is how I went about it. Setup First you ll have to set up your xTuple development environment and fork the xtuple and xtuple-extesions repositories as shown in this handy tutorial. A footnote I d like to add is please verify that your version of Vagrant (and anything else you install) is the one listed in the tutorial. I think I spent like two entire days or more on a wild goose (bug) chase trying to set up my environment when the cause of all the errors was that I somehow installed an older version of Vagrant - 1.5.4 instead of 1.6.3. Please don t make the same mistake I did. Actually if for some reason you get the following error when you try using node:

<<ERROR 2014-07-10T23:52:46.948Z>> Unrecoverable exception. Cannot call method 'extend' of undefined
    at /home/vagrant/dev/xtuple/lib/backbone-x/source/model.js:37:39
    at Object.<anonymous> (/home/vagrant/dev/xtuple/lib/backbone-x/source/model.js:1364:3)
    ...

chances are, you have the wrong version. That s what happened to me. The Vagrant Virtual Development Environment automatically installs and configures everything you need, it s ready to go. So if you find yourself installing and updating and apt-gets and etc, you probably did something wrong. Coding So by now we should have the Vagrant Virtual Development Environment set up and the web app up and running and accessible at localhost:8443. So far so good. Disclaimer: You will note that much of this is similar - or rather, nearly identical - to xTuple s tutorial but there are some small but important differences and a few observations I think might be useful. Other Disclaimer: I m describing how I did it, which may or may not be up to snuff . Works for me though. Schema First let s make a schema for the table we will create with the new custom fields. Be sure to create the correct directory stucture, aka /path/to/xtuple-extensions/source/<YOUR EXTENSION NAME>/database/source or in my case /path/to/xtuple-extensions/source/sip_account/database/source, and create the file create_sa_schema.sql, sa is the name of my schema. This file will contain the following lines:

do $$
  /* Only create the schema if it hasn't been created already */
  var res, sql = "select schema_name from information_schema.schemata where schema_name = 'sa'",
  res = plv8.execute(sql);
  if (!res.length)  
    sql = "create schema sa; grant all on schema sa to group xtrole;"
    plv8.execute(sql);
   
$$ language plv8;

Of course, feel free to replace sa with your schema name of choice. All the code described here can be found in my xtuple-extensions fork, on the sip_ext branch. Table We ll create a table containing your custom fields and a link to an existing table - the table for the existing business object you want to extend. If you re wondering why make a whole new table for a few extra fields, here s a good explanation, the case in question is adding fields to the Contact business object. You need to first figure out what table you want to link to. This might not be uber easy. I think the best way to go about it is to look at the ORMs. The xTuple ORMs are a JSON mapping between the SQL tables and the object-oriented world above the database, they re .json files found at path/to/xtuple/node_modules/xtuple/enyo-client/database/orm/models for the core business objects and at path/to/xtuplenyo-client/extensions/source/<EXTENSION NAME>/database/orm/models for exension business objects. I ll give two examples. If you look at contact.json you will see that the Contact business object refers to the table cntct . Look for the type : Contact on the line above, so we know it s the Contact business object. In my case, I wanted to extend the UserAccount and UserAccountRelation business objects, so check out user_account.json. The table listed for UserAccount is xt.usrinfo and the table listed for UserAccountRelation is xt.usrlite. A closer look at the sql files for these tables (usrinfo.sql and usrlite.sql) revealed that usrinfo is in fact a view and usrlite is A light weight table of user information used to avoid punishingly heavy queries on the public usr view . I chose to refer to xt.usrlite - that or I received error messages when trying the other table names. Now I ll make the file /path/to/xtuple-extensions/source/sip_account/database/source/usrlitesip.sql, to create a table with my custom fields plus the link to the urslite table. Don t quote me on this, but I m under the impression that this is the norm for naming the sql file joining tables: the name of the table you are referring to ( usrlite in this case) and your extension s name. Content of usrlitesip.sql:

select xt.create_table('usrlitesip', 'sa');
select xt.add_column('usrlitesip','usrlitesip_id', 'serial', 'primary key', 'sa');
select xt.add_column('usrlitesip','usrlitesip_usr_username', 'text', 'references xt.usrlite (usr_username)', 'sa');
select xt.add_column('usrlitesip','usrlitesip_uri', 'text', '', 'sa');
select xt.add_column('usrlitesip','usrlitesip_name', 'text', '', 'sa');
select xt.add_column('usrlitesip','usrlitesip_password', 'text', '', 'sa');
comment on table sa.usrlitesip is 'Joins User with SIP account';

Breaking it down, line 1 creates the table named usrlitesip (no duh), line 2 is for the primary key (self-explanatory). You can then add any columns you like, just be sure to add one that references the table you want to link to. I checked usrlite.sql and saw the primary key is usr_username, be sure to use the primary key of the table you are referencing. You can check what you made by executing the .sql files like so:

$ cd /path/to/xtuple-extensions/source/sip_account/database/source
$ psql -U admin -d dev -f create_sa_schema.sql
$ psql -U admin -d dev -f usrlitesip.sql

After which you will see the table with the columns you created if you enter:

$ psql -U admin -d dev -c "select * from sa.usrlitesip;"

Now create the file /path/to/xtuple-extensions/source/sip_account/database/source/manifest.js to put the files together and in the right order. It should contain:

 
  "name": "sip_account",
  "version": "1.4.1",
  "comment": "Sip Account extension",
  "loadOrder": 999,
  "dependencies": ["crm"],
  "databaseScripts": [
    "create_sa_schema.sql",
    "usrlitesip.sql",
    "register.sql"
  ]

I think the name has to be the same you named your extension directory as in /path/to/xtuple-extensions/source/<YOUR EXTENSION NAME>. I think the comment can be anything you like and you want your loadOrder to be high so it s the last thing installed (as it s an add on.) So far we are doing exactly what s instructed in the xTuple tutorial. It s repetitive, but I think you can never have too many examples to compare to. In databaseScripts you will list the two .sql files you just created for the schema and the table, plus another file to be made in the same directory named register.sql. I m not sure why you have to make the register.sql or even if you indeed have to. If you leave the file empty, there will be a build error, so put a ; in the register.sql or remove the line register.sql from manifest.js as I think for now we are good without it. Now let s update the database with our new extension:

$ cd /path/to/xtuple
$ ./scripts/build_app.js -d dev -e ../xtuple-extensions/source/sip_account
$ psql -U admin -d dev -c "select * from xt.ext;"

That last command should display a table with a list of extensions; the ones already in xtuple like crm and billing and some others plus your new extension, in this case sip_account . When you run build_app.js you ll probably see a message along the lines of <Extension name> has no client code, not building client code and that s fine because yeah, we haven t worked on the client code yet. ORM Here s where things start getting different. So ORMs link your object to an SQL table. But we DON T want to make a new business object, we want to extend an existing business object, so the ORM we will make will be a little different than the xTuple tutorial. Steve Hackbarth kindly explained this new business object/existing business object ORM concept here. First we ll create the directory /path/to/xtuple-extensions/source/sip_account/database/orm/ext, according to xTuple convention. ORMs for new business objects would be put in /path/to/xtuple-extensions/source/sip_account/database/orm/models. Now we ll create the .json file /path/to/xtuple-extensions/source/sip_account/database/orm/ext/user_account.jscon for our ORM. Once again, don t quote me on this, but I think the name of the file should be the name of the business object you are extending, as is done in the turorial example extending the Contact object. In our case, UserAccount is defined in user_account.json and that s what I named my extension ORM too. Here s what you should place in it:

 1 [
 2    
 3     "context": "sip_account",
 4     "nameSpace": "XM",
 5     "type": "UserAccount",
 6     "table": "sa.usrlitesip",
 7     "isExtension": true,
 8     "isChild": false,
 9     "comment": "Extended by Sip",
10     "relations": [
11        
12         "column": "usrlitesip_usr_username",
13         "inverse": "username"
14        
15     ],
16     "properties": [
17        
18         "name": "uri",
19         "attr":  
20           "type": "String",
21           "column": "usrlitesip_uri",
22           "isNaturalKey": true
23          
24        ,
25        
26         "name": "displayName",
27         "attr":  
28           "type": "String",
29           "column": "usrlitesip_name"
30          
31        ,
32        
33         "name": "sipPassword",
34         "attr":  
35           "type": "String",
36           "column": "usrlitesip_password"
37          
38        
39     ],
40     "isSystem": true
41    ,
42    
43     "context": "sip_account",
44     "nameSpace": "XM",
45     "type": "UserAccountRelation",
46     "table": "sa.usrlitesip",
47     "isExtension": true,
48     "isChild": false,
49     "comment": "Extended by Sip",
50     "relations": [
51        
52         "column": "usrlitesip_usr_username",
53         "inverse": "username"
54        
55     ],
56     "properties": [
57        
58         "name": "uri",
59         "attr":  
60           "type": "String",
61           "column": "usrlitesip_uri",
62           "isNaturalKey": true
63          
64        ,
65        
66         "name": "displayName",
67         "attr":  
68           "type": "String",
69           "column": "usrlitesip_name"
70          
71        ,
72        
73         "name": "sipPassword",
74         "attr":  
75           "type": "String",
76           "column": "usrlitesip_password"
77          
78        
79     ],
80     "isSystem": true
81    
82 ]

Note the context is my extension name, because the context + nameSpace + type combo has to be unique. We already have a UserAccount and UserAccountRelation object in the XM namespace in the xtuple context in the original user_account.json, now we will have a UserAccount and UserAccountRelation object in the XM namespace in the sip_account conext. What else is important? Note that isExtension is true on lines 7 and 47 and the relations item contains the column of the foreign key we referenced. This is something you might want to verify: column (lines 12 and 52) is the name of the attribute on your table. When we made a reference to the primary key usr_usrname from the xt.usrlite table we named that column usrlitesip_usr_usrname. But the inverse is the attribute name associated with the original sql column in the original ORM. Did I lose you? I had a lot of trouble with this silly thing. In the original ORM that created a new UserAccount business object, the primary key attribute is named username , as can be seen here. That is what should be used for the inverse value. Not the sql column name (usr_username) but the object attribute name (username). I m emphasizing this because I made that mistake and if I can spare you the pain I will. If we rebuild our extension everything should come along nicely, but you won t see any changes just yet in the web app because we haven t created the client code. Client Create the directory /path/to/xtuple-extensions/source/sip_account/client which is where we ll keep all the client code. Extend Workspace View I want the fields I added to show up on the form to create a new User Account, so I need to extend the view for the User Account workspace. I ll start by creating a directory /path/to/xtuple-extensions/source/sip_account/client/views and in it creating a file named workspace.js containing this code:

XT.extensions.sip_account.initWorkspace = function ()  
	var extensions = [
  	 kind: "onyx.GroupboxHeader", container: "mainGroup", content: "_sipAccount".loc() ,
  	 kind: "XV.InputWidget", container: "mainGroup", attr: "uri"  ,
  	 kind: "XV.InputWidget", container: "mainGroup", attr: "displayName"  ,
  	 kind: "XV.InputWidget", container: "mainGroup", type:"password", attr: "sipPassword"  
	];
	XV.appendExtension("XV.UserAccountWorkspace", extensions);
 ;

So I m initializing my workspace and creating an array of items to add (append) to view XV.UserAccountWorkspace. The first item is this onyx.GroupboxHeader which is a pretty divider for my new form fields, the kind you find in the web app at Setup > User Accounts, like Overview . I have no idea what other options there are for container other than mainGroup , so let s stick to that. I ll explain content: _sipAccount .loc() in a bit. Next I created three input fields of the XV.InputWidget kind. This also confused me a bit as there are different kinds of input to be used, like dropdowns and checkboxes. The only advice I can give is snoop around the webapp, find an input you like and look up the corresponding workspace.js file to see what was used. What we just did is (should be) enough for the new fields to show up on the User Account form. But before we see things change, we have to package the client. Create the file /path/to/xtuple-extensions/source/sip_account/client/views/package.js. This file is needed to package groups of files and indicates the order the files should be loaded (for more on that, see this). For now, all the file will contain is:

enyo.depends(
"workspace.js"
);

You also need to package the views directory containing workspace.js, so create the file Create the file /path/to/xtuple-extensions/source/sip_account/client/package.js and in it show that the directory views and its contents must be part of the higher level package:

enyo.depends(
"views"
);

I like to think of it as a box full of smaller boxes. This will sound terrible, but apparently you also need to create the file /path/to/xtuple-extensions/source/sip_account/client/core.js containing this line:

XT.extensions.icecream =  ;

I don t know why. As soon as I find out I ll be sure to inform you. As we ve added a file to the client directory, be sure to update /path/to/xtuple-extensions/source/sip_account/client/package.js so it included the new file:

enyo.depends(
"core.js",
"views"
);

Translations Remember _sipAccount .loc() in our workspace.js file? xTuple has great internationalization support and it s easy to use. Just create the directory and file /path/to/xtuple-extensions/source/sip_account/client/en/strings.js and in it put key-value pairs for labels and their translation, like this:

(function ()  
  "use strict";
  var lang = XT.stringsFor("en_US",  
    "_sipAccount": "Sip Account",
    "_uri": "Sip URI",
    "_displayName": "Display Name",
    "_sipPassword": "Password"
   );
  if (typeof exports !== 'undefined')  
    exports.language = lang;
   
 ());

So far I included all the labels I used in my Sip Account form. If you write the wrong label (key) or forget to include a corresponding key-value pair in strings.js, xTuple will simply name your lable _lableName , underscore and all. Now build your extension and start up the server:

$ cd /path/to/xtuple 
$ ./scripts/build_app.js -d dev -e ../xtuple-extensions/source/sip_account
$ node node-datasource/main.js

If the server is already running, just stop it and restart it to reflect your changes. Now if you go to Setup > User Accounts and click the + button, you should see a nice little addition to the form with a Sip Account divider and three new fields. Nice, eh? Extend Parameters Currently you can search your User Accounts list using any of the User Account fields. It would be nice to be able to search with the Sip account fields we added as well. To do that, let s create the directory /path/to/xtuple-extensions/source/sip_account/client/widgets and there create the file parameter.js to extend XV.UserAccountListParameters. One again, you ll have to look this up. In the xTuple code you ll find the application s parameter.js in /path/to/xtuple/enyo-client/application/source/widgets. Search for the business object you are extending (for example, XV.UserAccount) and look for some combination of the business object name and Parameters . If there s more than one, try different ones. Not a very refined method, but it worked for me. Here s the content of our parameter.js:

XT.extensions.sip_account.initParameterWidget = function ()  
  var extensions = [
     kind: "onyx.GroupboxHeader", content: "_sipAccount".loc() ,
     name: "uri", label: "_uri".loc(), attr: "uri", defaultKind: "XV.InputWidget" ,
     name: "displayName", label: "_displayName".loc(), attr: "displayName", defaultKind: "XV.InputWidget" 
  ];
  XV.appendExtension("XV.UserAccountListParameters", extensions);
 ;

Node that I didn t include a search field for the password attribute for obvious reasons. Now once again, we package this new code addition by creating a /path/to/xtuple-extensions/source/sip_account/client/widgets/package.js file:

enyo.depends(
"parameter.js"
);

We also have to update /path/to/xtuple-extensions/source/sip_account/client/package.js:

enyo.depends(
"core.js",
"widgets"
"views"
);

Rebuild the extension (and restart the server) and go to Setup > User Accounts. Press the magnifying glass button on the upper left side of the screen and you ll see many options for filtering the User Accounts, among them the SIP Uri and Display Name. Extend List View You might want your new fields to show up on the list of User Accounts. There s a bit of an issue here because unlike what we did in workspace.js and parameter.js, we can t append new things to the list of UserAccounts with the funciton XV.appendExtension(args). First I tried overwriting the original UserAccountList, which works but it s far from ideal as this could result in a loss of data from the core implementation. After some discussion with the xTuple dev community, now there s a better alternative: Create the file /path/to/xtuple-extensions/source/sip_account/client/views/list.js and add the following:

1 var oldUserAccountListCreate = XV.UserAccountList.prototype.create;
2 
3 XV.UserAccountList.prototype.create = function ()  
4   oldUserAccountListCreate.apply(this, arguments);
5   this.createComponent(
6    kind: "XV.ListColumn", container: this.$.listItem, components: [
7         kind: "XV.ListAttr", attr: "uri" 
8    ] )
9  ;

To understand what I m doing, check out the XV.UserAccountList implementation in /path/to/xtuple/enyo-client/application/source/views/list.js the entire highlighted part. What we are doing is extending XV.UserAccountList through prototype-chaining ; this is how inheritance works with Enyo. In line 1 we create a prototype and in line 4 we inherit the features including original components array which the list is based on. We then create an additional component immitating the setup shown in XV.UserAccountList: An XV.ListColumn containing an XV.ListAttr, which should be placed in the XV.ListItem components array as is done with the existing columns (refer to implementation). Components can or should (?) have names which are used to access said components. You d refer to a specific component by the this.$.componentName hash. The components in XV.UserAccountList don t have names, so Enyo automatically names them (apparently) based on the kind name, for example something of the kind ListItem is named listItem. I found this at random after a lot of trial and error and it s not a bullet proof solution. Can be bettered. It s strange because if you encapsulate that code with

XT.extensions.sip_account.initList = function ()  
 //Code here
 ;

as is done with parameter.js and workspace.js (and in the xTuple tutorial you are supposed to do that with a new business object), it doesn t work. I have no idea why. This might be wrong or against xTuple coding norms; I will find out and update this post ASAP. But it does work this way. * shrugs * That said, as we ve created the list.js file, we need to ad it to our package by editing /path/to/xtuple-extensions/source/sip_account/client/views/package.js:

enyo.depends(
"list.js",
"workspace.js"
);

That s all. Rebuild the app and restart your server and when you select Setup > User Accounts in the web app you should see the Sip URI displayed on the User Accounts that have the Sip Account data. Add a new User Account to try this out.

6 July 2014

Dominique Dumont: Status and next step on lcdproc automatic configuration upgrade with Perl and Config::Model

Back in March, I uploaded a Debian s version of lcdproc with a unique feature: user and maintainer configurations are merged during package upgrade: user customizations and developers enhancements are both preserved in the new configuration file. (See this blog for more details). This avoids tedious edition of the configuration LCDd.conf file after every upgrade of lcdproc package. At the beginning of June, a new version of lcdproc (0.5.7-1) was uploaded. This triggered another round of automatic upgrades on user s systems. According to the popcon rise of libconfig-model-lcdproc-perl, about 100 people have upgraded lcdproc on their system. Since automatic upgrade has an opt-out feature, one cannot say for sure that 100 people are actually using automatic upgrade, but I bet a fair portion are them are. So far, only one people has complained: a bug report was filed about the many dependencies brought by libconfig-model-lcdproc-perl. The next challenge for lcdproc configuration upgrade is brought by a bug reported on Ubuntu: the device file provided by imon kernel module is a moving target: The device file created by the kernel can be /dev/lcd0 or /dev/lcd1 or even /dev/lcd2. Static configuration files and moving target don t mix well. The obvious solution is to provide a udev rule so that a symbolic link is created from a fixed location (/dev/lcd-imon) to the moving target. Once the udev rule is installed, the user only has to update LCDd.conf file to use the symlink as imon device file and we re done. But, wait The whole point of automatic configuration upgrade is to spare the user this kind of trouble: the upgrade must be completely automatic. Moreover, the upgrade must work in all cases: whether udev is available (Linux) or not. If udev is not available, the value present in the configuration file must be preserved. To know whether udev is available, the upgrade tool (aka cme) will check whether the file provided by udev (/dev/lcd-imon) is present or not. This will be done by lcdproc postinst script (which is run automatically at the end of lcdproc upgrade). Which means that the new udev rule must also be
activated in the postinst script before the upgrade is done. In other words, the next version of lcdproc (0.5.7-2) will:

Install a new udev rule to provide lcd-imon symbolic link
Activate this rule in lcdproc postinst script before upgrading the configuration (note to udev experts: yes, the udev rule is activated with --action=change option)
Upgrade the configuration by running cme migrate in lcdproc postinst script.

In the lcdproc configuration model installed by libconfig-model-lcdproc-perl, the imon device parameter is enhanced so that running cme check lcdproc or cme migrate lcdproc issues a warning if /dev/lcd-imon exists and if imon driver is not configured to use it. This way, the next installation of lcdproc will deliver a fix for imon and cme will fix user s configuration file without requiring user input. The last point is admittedly bad marketing as users will not be aware of the magic performed by Config::Model Oh well In the previous section, I ve briefly mentioned that imon_device parameter is enhanced in lcdproc configuration model. If you re not already bored, let s lift the hood and see what kind of enhancements was added. Let s peek in lcdproc configuration file, LCDd.conf file which is used to generate lcdproc configuration model. You may remember that the formal description of all LCDd.conf parameters and their properties is generated from LCDd.conf to provide lcdproc configuration model. The comments in LCDd.conf follow a convention so that most properties of the parameters can be extracted from the comments. In the example below, the comments show that NewFirmware is a boolean value expressed as yes or no, the latter being the default :

# Set the firmware version (New means >= 2.0) [default: no; legal: yes, no]
NewFirmware=no

Back to the moving target. In LCDd.conf, imon device file parameter is declared this way:

# Select the output device to use
Device=/dev/lcd0

This means that device is a string where the default value is /dev/lcd0. Which is wrong once the special udev rule provided with Debian packages is activated. With this rule, the default value must be /dev/lcd-imon. To fix this problem, a special comment is added in the Debian version of LCDd.conf to tune further the properties of the device parameter:

# select the device to use
#  %
#   default~
#   compute
#     use_eval=1
#     formula="my $l = '/dev/lcd-imon'; -e $l ? $l : '/dev/lcd0';"
#     allow_override=1 -
#   warn_if:not_lcd_imon
#     code="my $l = '/dev/lcd-imon';defined $_ and -e $l and $_ ne $l ;"
#     msg="imon device does not use /dev/lcd-imon link."
#     fix="$_ = undef;"
#   warn_unless:found_device_file
#     code="defined $_ ? -e : 1"
#     msg="missing imon device file"
#     fix="$_ = undef;"
#   - % 
Device=/dev/lcd0

This special comment between % and % follows the syntax of Config::Model::Loader. A small configuration model is declared there to enhance the model generated from LCDd.conf file. Here are the main parts:

default~ suppress the default value of the device parameter declared in the original LCDd.conf (i.e. /dev/ldcd0 )
compute and the 3 lines below computes a default value for the device file. Since use_eval is true, the formula is evaluated as Perl code. This code will return /dev/lcd-imon if this file is found. Otherwise, /dev/lcd0 is returned. Hence, either /dev/lcd-imon or /dev/lcd0 will be used a as default value. allow_override=1 lets the user override this computed value
warn_if and the 3 lines below test the configured device file with the Perl instructions provided by the code parameter. There, the device value is available in the $_ variable. This code will return true if /dev/lcd-imon exists and if the configured device does not use it. This will trigger a warning that will show the specified message.
Similarly warn_unless and the 3 lines below warns the user if the configured device file is not found.

In both warn_unless and warn_if parts, the fix code snippet is run when by the command cme fix lcdproc and is used to repair the warning condition. In this case, the fix consists in resetting the device configuration value so the computed value above can be used. cme fix lcdproc is triggered during package post install script installed by dh_cme_upgrade. Come to think of it, generating a configuration model from a configuration file can probably be applied to other projects: for instance, php.ini and kdmrc are also shipped with detailed comments. May be I should make a more generic model generator from the example used to generate lcdproc model Well, I will do it if people show interest. Not in the form yeah, that would be cool , but in the form, yes, I will use your work to generate a configuration model for project [...] . I ll let you fill the blank ;-)
Tagged: Config::Model, configuration, debian, lcdproc, Perl, upgrade

5 June 2014

Dirk Eddelbuettel: RcppArmadillo 0.4.300.8.0

A new minor / bug fix release 4.300.8 of Armadillo, the templated C++ library for linear algebra, was tagged by Conrad in his SVN repo a few days ago, following earlier snapshots in the 4.300.* series. We had prepared two earlier releases for GitHub but not CRAN in order to accomodate the CRAN maintainer's desire of "a release every one to two months" expressed in the CRAN Repo Policy. However, two actual bugs in the interaction between MinGW and C++11 were reported on the rcpp-devel mailing list, and this releases addresses these. Hence time for new release 0.4.300.8.0 of RcppArmadillo which is now on CRAN and in Debian. This release brings a few upstream changes detailed below such as nice new upstream changes such as more robust norm() (and related) functions, and fixes related to matrix and cube interactions. From our end, we added a better detection of Windows via both _WIN32 and WIN32 (as the former apparently gets undefined by MinGW in C++11 mode). We also added the ability to turn on C++11 support from R (possible since R 3.1.0) yet also turn it off for Armadillo. This is needed as the prescribed compiler on Windows is g++ 4.6.2 -- which offers a subset of C++11 which is good enough for a number of things from the C++11 standard, but not advanced enough for everything which Armadillo uses when C++11 support is turned on. As Armadillo continues to offer a choice of C++ standards, we can use the ability to deploy C++11 only outside of its internals. It is worth repeating that this issue should only affect Windows users wishing to use C++11; other platforms are fine as they generally have more modern compilers.

Changes in RcppArmadillo version 0.4.300.8.0 (2014-05-31)

Upgraded to Armadillo release Version 4.300.8 (Medieval Cornea Scraper)

More robust norm-related functions

Fixes between interactions between cube and vector types.

Adds a #define ARMA_DONT_USE_CXX11 to provide an option to turn C++11 off for Armadillo (but client packages may still use it)

More robust Windows detection by using _WIN32 as well as WIN32 as the latter gets diabled by MinGW with C++11

On Windows, C++11 can be turned off as the Armadillo code base uses more features of C++11 than g++ 4.6.2 version in Rtools implements

Changes in RcppArmadillo version 0.4.300.5.0 (2014-05-19)

Upgraded to Armadillo release Version 4.300.5 (Medieval Cornea Scraper)

Handle possible underflows and overflows in norm(), normalise(), norm_dot()

Fix for handling of null vectors by norm_dot()

Changes in RcppArmadillo version 0.4.300.2.0 (2014-05-13)

Upgraded to Armadillo release Version 4.300.2 (Medieval Cornea Scraper)

faster find()

Courtesy of CRANberries, there is also a diffstat report for the most recent release As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

28 April 2014

Evgeni Golov: Debian Bug Squashing Party Salzburg 2014

This weekend, Bernd Zeimetz organized a BSP at the offices of conova in Salzburg, Austria. Three days of discussions, bugfixes, sparc removals and a lot of fun and laughter. We squashed a total of 87 bugs: 66 bugs affecting Jessie/Sid were closed, 9 downgraded and 8 closed via removals. As people tend to care about (old)stable, 3 bugs were fixed in Wheezy and one in Squeeze. These numbers might be not totaly correct, as were kinda creative at counting Marga promised a talk about an introduction to properly counting bugs using the Haus vom Nikolaus algorithm to the base of 7 .

Speaking of numbers, I touched the following bugs (not all RC):

#741806: pygresql: FTBFS: pgmodule.c:32:22: fatal error: postgres.h: No such file or directory
Uploaded an NMU with a patch. The bug was introduced by the recent PostgreSQL development package reorganisation.
#744229: qpdfview: FTBFS synctex/synctex_parser.c:275:20: fatal error: zlib.h: No such file or directory
Talked to the maintainer, explaining the importance of the upload and verifying his fix.
#744300: pexpect: missing dependency on dh-python
Downgraded to wishlist after verifying the build dependency is only needed when building for Wheezy backports.
#744917: luajit: FTBFS when /sbin is not in $PATH
Uploaded an NMU with a patch, which later was canceled due to a maintainer upload with a slightly different fix.
#742943: nagios-plugins-contrib: check_raid: wants mpt-statusd / mptctl
Analyzed the situation, verified the status with the latest upstream version of the ckeck and commented on the bug.
#732110: nagios-plugins-contrib: check_rbl error when nameserver available only in IPv6
Verify that the bug is fixed in the latest release and mark it as done.
#684726: nagios-plugins-contrib: RFP: check-v46 Icinga / Nagios plugin for dual stacked (IPv4 / IPv6) hosts
Mark bug as done, the changelog was missing a proper Closes tag.
#661167: nagios-plugins-contrib: please include nagios-check-printer-status
Mark bug as done, the changelog was missing a proper Closes tag.
#745895: nagios-plugins-contrib: does not compile against Varnish 4.0.0
Write a patch for supporting the Varnish 3 and 4 APIs at the same time. Also proposed the patch upstream.
#744922: nagios-plugins-contrib: check_packages: check for security updates broken
Forward our security_updates_critical patch and Felix fixes to it upstream, then updating check_packages to the latest upstream version.
#744248: nagios-plugins-contrib: check_cert_expire: support configurable warn/crit times
Forward Helmut s patch upstream, then updating check_cert_expire to the latest upstream version.
#745691: django-classy-tags: FTBFS: Sphinx documentation not found
Analyze the issue and the fix proposed in SVN, comment on the bug.
#713876: thinkfan: [PATCH] bugfix: use $MAINPID for ExecReload
Prepare and upload of the latest upstream release, which includes Michael s patch.
#713878: thinkfan: [PATCH] use dh-systemd for proper systemd-related maintscripts
Apply Michael s patch to the Debian packaging.
#728087: thinkfan: Document how to start thinkfan with systemd
Apply Michael s patch to the Debian packaging.
#742515: blktap-dkms: blktapblktap kernel module failed to build
Upload an NMU with a patch based on the upstream fix.
#745598: libkolab: FTBFS in dh_python2 (missing Build-Conflicts?)
Upload an NMU with a patch against libkolab s cmake rules, tightening the search for Python to 2.7.
#745599: libkolabxml: FTBFS with undefined reference to symbol _ZTVN5boost6detail16thread_data_baseE
Upload an NMU with a patch against libkolabxml s cmake rules, properly linking the tests to the Boost libraries.
#746160: libcolabxml: FTBFS when both python2 and python3 development headers are installed
Filling the bug while working on #745599, then uploading an NMU with a patch against libkolabxml s cmake rules, tightening the search for Python to 2.7.
#714045: blcr-dkms: blcr module is not built on kernel 3.9.1
Checking the status of the bug upstream, and marking it as forwarded.
#653404: hdapsd: init.d status support
Added Peter s patch to the Debian packaging, upload yet pending.
#702199: hdapsd: Typo in package description
Fixed typo in the description, upload yet pending.
#745219: crmsh: should depends on python-yaml
Verifying the bug with Stefan Bauer and sponsoring his NMU.
#741600: 389-ds-base: CVE-2014-0132
Sponsoring the NMU for Tobias Frost.

A couple of (non-free) pictures are available at Uwe s salzburg-cityguide.at. Thanks again to Bernd for organizing and conova and credativ for sponsoring!

15 April 2014

Colin Watson: Porting GHC: A Tale of Two Architectures

We had some requests to get GHC (the Glasgow Haskell Compiler) up and running on two new Ubuntu architectures: arm64, added in 13.10, and ppc64el, added in 14.04. This has been something of a saga, and has involved rather more late-night hacking than is probably good for me. Book the First: Recalled to a life of strange build systems You might not know it from the sheer bulk of uploads I do sometimes, but I actually don't speak a word of Haskell and it's not very high up my list of things to learn. But I am a pretty experienced build engineer, and I enjoy porting things to new architectures: I'm firmly of the belief that breadth of architecture support is a good way to shake out certain categories of issues in code, that it's worth doing aggressively across an entire distribution, and that, even if you don't think you need something now, new requirements have a habit of coming along when you least expect them and you might as well be prepared in advance. Furthermore, it annoys me when we have excessive noise in our build failure and proposed-migration output and I often put bits and pieces of spare time into gardening miscellaneous problems there, and at one point there was a lot of Haskell stuff on the list and it got a bit annoying to have to keep sending patches rather than just fixing things myself, and ... well, I ended up as probably the only non-Haskell-programmer on the Debian Haskell team and found myself fixing problems there in my free time. Life is a bit weird sometimes. Bootstrapping packages on a new architecture is a bit of a black art that only a fairly small number of relatively bitter and twisted people know very much about. Doing it in Ubuntu is specifically painful because we've always forbidden direct binary uploads: all binaries have to come from a build daemon. Compilers in particular often tend to be written in the language they compile, and it's not uncommon for them to build-depend on themselves: that is, you need a previous version of the compiler to build the compiler, stretching back to the dawn of time where somebody put things together with a big magnet or something. So how do you get started on a new architecture? Well, what we do in this case is we construct a binary somehow (usually involving cross-compilation) and insert it as a build-dependency for a proper build in Launchpad. The ability to do this is restricted to a small group of Canonical employees, partly because it's very easy to make mistakes and partly because things like the classic "Reflections on Trusting Trust" are in the backs of our minds somewhere. We have an iron rule for our own sanity that the injected build-dependencies must themselves have been built from the unmodified source package in Ubuntu, although there can be source modifications further back in the chain. Fortunately, we don't need to do this very often, but it does mean that as somebody who can do it I feel an obligation to try and unblock other people where I can. As far as constructing those build-dependencies goes, sometimes we look for binaries built by other distributions (particularly Debian), and that's pretty straightforward. In this case, though, these two architectures are pretty new and the Debian ports are only just getting going, and as far as I can tell none of the other distributions with active arm64 or ppc64el ports (or trivial name variants) has got as far as porting GHC yet. Well, OK. This was somewhere around the Christmas holidays and I had some time. Muggins here cracks his knuckles and decides to have a go at bootstrapping it from scratch. It can't be that hard, right? Not to mention that it was a blocker for over 600 entries on that build failure list I mentioned, which is definitely enough to make me sit up and take notice; we'd even had the odd customer request for it. Several attempts later and I was starting to doubt my sanity, not least for trying in the first place. We ship GHC 7.6, and upgrading to 7.8 is not a project I'd like to tackle until the much more experienced Haskell folks in Debian have switched to it in unstable. The porting documentation for 7.6 has bitrotted more or less beyond usability, and the corresponding documentation for 7.8 really isn't backportable to 7.6. I tried building 7.8 for ppc64el anyway, picking that on the basis that we had quicker hardware for it and didn't seem likely to be particularly more arduous than arm64 (ho ho), and I even got to the point of having a cross-built stage2 compiler (stage1, in the cross-building case, is a GHC binary that runs on your starting architecture and generates code for your target architecture) that I could copy over to a ppc64el box and try to use as the base for a fully-native build, but it segfaulted incomprehensibly just after spawning any child process. Compilers tend to do rather a lot, especially when they're built to use GCC to generate object code, so this was a pretty serious problem, and it resisted analysis. I poked at it for a while but didn't get anywhere, and I had other things to do so declared it a write-off and gave up. Book the Second: The golden thread of progress In March, another mailing list conversation prodded me into finding a blog entry by Karel Gardas on building GHC for arm64. This was enough to be worth another look, and indeed it turned out that (with some help from Karel in private mail) I was able to cross-build a compiler that actually worked and could be used to run a fully-native build that also worked. Of course this was 7.8, since as I mentioned cross-building 7.6 is unrealistically difficult unless you're considerably more of an expert on GHC's labyrinthine build system than I am. OK, no problem, right? Getting a GHC at all is the hard bit, and 7.8 must be at least as capable as 7.6, so it should be able to build 7.6 easily enough ... Not so much. What I'd missed here was that compiler engineers generally only care very much about building the compiler with older versions of itself, and if the language in question has any kind of deprecation cycle then the compiler itself is likely to be behind on various things compared to more typical code since it has to be buildable with older versions. This means that the removal of some deprecated interfaces from 7.8 posed a problem, as did some changes in certain primops that had gained an associated compatibility layer in 7.8 but nobody had gone back to put the corresponding compatibility layer into 7.6. GHC supports running Haskell code through the C preprocessor, and there's a __GLASGOW_HASKELL__ definition with the compiler's version number, so this was just a slog tracking down changes in git and adding #ifdef-guarded code that coped with the newer compiler (remembering that stage1 will be built with 7.8 and stage2 with stage1, i.e. 7.6, from the same source tree). More inscrutably, GHC has its own packaging system called Cabal which is also used by the compiler build process to determine which subpackages to build and how to link them against each other, and some crucial subpackages weren't being built: it looked like it was stuck on picking versions from "stage0" (i.e. the initial compiler used as an input to the whole process) when it should have been building its own. Eventually I figured out that this was because GHC's use of its packaging system hadn't anticipated this case, and was selecting the higher version of the ghc package itself from stage0 rather than the version it was about to build for itself, and thus never actually tried to build most of the compiler. Editing ghc_stage1_DEPS in ghc/stage1/package-data.mk after its initial generation sorted this out. One late night building round and round in circles for a while until I had something stable, and a Debian source upload to add basic support for the architecture name (and other changes which were a bit over the top in retrospect: I didn't need to touch the embedded copy of libffi, as we build with the system one), and I was able to feed this all into Launchpad and watch the builders munch away very satisfyingly at the Haskell library stack for a while. This was all interesting, and finally all that work was actually paying off in terms of getting to watch a slew of several hundred build failures vanish from arm64 (the final count was something like 640, I think). The fly in the ointment was that ppc64el was still blocked, as the problem there wasn't building 7.6, it was getting a working 7.8. But now I really did have other much more urgent things to do, so I figured I just wouldn't get to this by release time and stuck it on the figurative shelf. Book the Third: The track of a bug Then, last Friday, I cleared out my urgent pile and thought I'd have another quick look. (I get a bit obsessive about things like this that smell of "interesting intellectual puzzle".) slyfox on the #ghc IRC channel gave me some general debugging advice and, particularly usefully, a reduced example program that I could use to debug just the process-spawning problem without having to wade through noise from running the rest of the compiler. I reproduced the same problem there, and then found that the program crashed earlier (in stg_ap_0_fast, part of the run-time system) if I compiled it with +RTS -Da -RTS. I nailed it down to a small enough region of assembly that I could see all of the assembly, the source code, and an intermediate representation or two from the compiler, and then started meditating on what makes ppc64el special. You see, the vast majority of porting bugs come down to what I might call gross properties of the architecture. You have things like whether it's 32-bit or 64-bit, big-endian or little-endian, whether char is signed or unsigned, that sort of thing. There's a big table on the Debian wiki that handily summarises most of the important ones. Sometimes you have to deal with distribution-specific things like whether GL or GLES is used; often, especially for new variants of existing architectures, you have to cope with foolish configure scripts that think they can guess certain things from the architecture name and get it wrong (assuming that powerpc* means big-endian, for instance). We often have to update config.guess and config.sub, and on ppc64el we have the additional hassle of updating libtool macros too. But I've done a lot of this stuff and I'd accounted for everything I could think of. ppc64el is actually a lot like amd64 in terms of many of these porting-relevant properties, and not even that far off arm64 which I'd just successfully ported GHC to, so I couldn't be dealing with anything particularly obvious. There was some hand-written assembly which certainly could have been problematic, but I'd carefully checked that this wasn't being used by the "unregisterised" (no specialised machine dependencies, so relatively easy to port but not well-optimised) build I was using. A problem around spawning processes suggested a problem with SIGCHLD handling, but I ruled that out by slowing down the first child process that it spawned and using strace to confirm that SIGSEGV was the first signal received. What on earth was the problem? From some painstaking gdb work, one thing I eventually noticed was that stg_ap_0_fast's local stack appeared to be being corrupted by a function call, specifically a call to the colourfully-named debugBelch. Now, when IBM's toolchain engineers were putting together ppc64el based on ppc64, they took the opportunity to fix a number of problems with their ABI: there's an OpenJDK bug with a handy list of references. One of the things I noticed there was that there were some stack allocation optimisations in the new ABI, which affected functions that don't call any vararg functions and don't call any functions that take enough parameters that some of them have to be passed on the stack rather than in registers. debugBelch takes varargs: hmm. Now, the calling code isn't quite in C as such, but in a related dialect called "Cmm", a variant of C-- (yes, minus), that GHC uses to help bridge the gap between the functional world and its code generation, and which is compiled down to C by GHC. When importing C functions into Cmm, GHC generates prototypes for them, but it doesn't do enough parsing to work out the true prototype; instead, they all just get something like extern StgFunPtr f(void);. In most architectures you can get away with this, because the arguments get passed in the usual calling convention anyway and it all works out, but on ppc64el this means that the caller doesn't generate enough stack space and then the callee tries to save its varargs onto the stack in an area that in fact belongs to the caller, and suddenly everything goes south. Things were starting to make sense. Now, debugBelch is only used in optional debugging code; but runInteractiveProcess (the function associated with the initial round of failures) takes no fewer than twelve arguments, plenty to force some of them onto the stack. I poked around the GCC patch for this ABI change a bit and determined that it only optimised away the stack allocation if it had a full prototype for all the callees, so I guessed that changing those prototypes to extern StgFunPtr f(); might work: it's still technically wrong, not least because omitting the parameter list is an obsolescent feature in C11, but it's at least just omitting information about the parameter list rather than actively lying about it. I tweaked that and ran the cross-build from scratch again. Lo and behold, suddenly I had a working compiler, and I could go through the same build-7.6-using-7.8 procedure as with arm64, much more quickly this time now that I knew what I was doing. One upstream bug, one Debian upload, and several bootstrapping builds later, and GHC was up and running on another architecture in Launchpad. Success! Epilogue There's still more to do. I gather there may be a Google Summer of Code project in Linaro to write proper native code generation for GHC on arm64: this would make things a good deal faster, but also enable GHCi (the interpreter) and Template Haskell, and thus clear quite a few more build failures. Since there's already native code generation for ppc64 in GHC, getting it going for ppc64el would probably only be a couple of days' work at this point. But these are niceties by comparison, and I'm more than happy with what I got working for 14.04. The upshot of all of this is that I may be the first non-Haskell-programmer to ever port GHC to two entirely new architectures. I'm not sure if I gain much from that personally aside from a lot of lost sleep and being considered extremely strange. It has, however, been by far the most challenging set of packages I've ported, and a fascinating trip through some odd corners of build systems and undefined behaviour that I don't normally need to touch.

6 January 2014

Russ Allbery: remctl 3.7

remctl is the middleware layer that we use everywhere at Stanford. It's a simple GSS-API-authenticated network service that supports running commands with ACLs. There are client bindings available for a wide variety of programming languages. This release fixes a couple of irritating bugs: the client library leaked memory when remctl_set_ccache was used (which was affecting mod_webkdc from WebAuth), and Net::Remctl::Backend didn't validate argument counts correctly when one of the arguments came from standard input (which affected krb5-sync). I also worked around a bug in RHEL 5's Module::Build, and added sanity checking to Net::Remctl and related classes to ensure that the object argument wasn't undef. Also new in this release are support in the remctld server for systemd startup notification and socket activation, and (via the -Z) flag) support for upstart's expect stop synchronization method. This is mostly a "clearing the decks" release in advance of more significant work. The next release will replace the server event loop with libevent, in preparation for further improvements in how the server can handle persistent worker children. You can get the latest release from the remctl distribution page.

18 December 2013

Daniel Pocock: Embedding Python in multi-threaded C++ applications

Embedding Python into other applications to provide a scripting mechanism is a popular practice. Ganglia can run user-supplied Python scripts for metric collection and the Blender project does it too, allowing users to develop custom tools and use script code to direct their animations. There are various reasons people choose Python:

Modern, object orientated style of programming
Interpreted language (no need to compile things, just edit and run the code)
Python has a vast array of modules providing many features such as database and network access, data processing, etc

The bottom line is that the application developer who chooses to embed Python in their existing application can benefit from the product of all this existing code multiplied by the imagination of their users. Enter repro repro is the SIP proxy of the reSIProcate project. reSIProcate is an advanced SIP implementation developed in C++. repro is a multi-threaded process. repro's most serious competitor is the Kamailio SIP proxy. Kamailio has its own bespoke scripting language that it has inherited from the SIP Express Router (SER) family of projects. repro has always been far more rigid in its capabilities than Kamailio. On the other hand, while Kamailio has given users great flexibility, it has also come at a cost: users can easily build configurations that are not valid or may not do what they really intend if they don't understand the intricacies of the SIP protocol. Here is an example of the Kamailio configuration script (from Daniel's excellent blog about building a Skype-like service in less than an hour) Kamailio also has a wide array of plugins for things like database and LDAP access. repro only had embedded bdb and MySQL support. Embedding Python into repro appears to be a quick way to fill many of these gaps and allow users to combine the power of the reSIProcate stack with their own custom routing logic. On the other hand, it is not simply copying the Kamailio scripting solution: rather, it provides a distinctive alternative. Starting the integration Embedding Python is such a popular practice that there is even dedicated documentation on the subject. As well as looking there, I also looked over the example provided by the embedded Python module for Ganglia. Looking over the Ganglia mod_python code I noticed a lot of boilerplate code for reference counting and other tedious activities. Given that reSIProcate is C++ code, I thought I would look for a C++ solution to this and I came across PyCXX. PyCXX is licensed under BSD-like terms similar to reSIProcate itself so it is a good fit. There is also the alternative Boost.Python API, however, reSIProcate has been built without Boost dependencies so I decided to stick with PyCXX. I looked over the PyCXX examples and the documentation and was able to complete a first cut of the embedded Python scripting feature very quickly. Using PyCXX One unusual thing I noticed about PyCXX is that the Debian package, python-cxx-dev does not provide any shared library. Instead, some uncompiled source files are provided and each project using PyCXX must compile them and link them statically itself. Here is how I do that in the Makefile.am for pyroute in repro:

AM_CXXFLAGS = -I $(top_srcdir)
reproplugin_LTLIBRARIES = libpyroute.la
libpyroute_la_SOURCES = PyRoutePlugin.cxx
libpyroute_la_SOURCES += PyRouteWorker.cxx
libpyroute_la_SOURCES += $(PYCXX_SRCDIR)/cxxextensions.c
libpyroute_la_SOURCES += $(PYCXX_SRCDIR)/cxx_extensions.cxx
libpyroute_la_SOURCES += $(PYCXX_SRCDIR)/cxxsupport.cxx
libpyroute_la_SOURCES += $(PYCXX_SRCDIR)/../IndirectPythonInterface.cxx
libpyroute_la_CPPFLAGS = $(DEPS_PYTHON_CFLAGS)
libpyroute_la_LDFLAGS = -module -avoid-version
libpyroute_la_LDFLAGS += $(DEPS_PYTHON_LIBS)
EXTRA_DIST = example.py
noinst_HEADERS = PyRouteWorker.hxx
noinst_HEADERS += PyThreadSupport.hxx

The value PYCXX_SRCDIR must be provided on the configure command line. On Debian, it is /usr/share/python2.7/CXX/Python2 Going multi-threaded My initial implementation simply invoked the Python method from the main routing thread of the repro SIP proxy. This meant that it would only be suitable for executing functions that complete quickly, ruling out the use of any Python scripts that talk to network servers or other slow activities. When the proxy becomes heavily loaded, it is important that it can complete many tasks asynchronously, such as forwarding chat messages between users in real-time. Therefore, it was essential to extend the solution to run the Python scripts in a pool of worker threads. At this point, I had an initial feeling that there may be danger in just calling the Python methods from some other random threads started by my own code. I went to see the manual and I came across this specific documentation about the subject. It looks quite easy, just wrap the call to the user-supplied Python code in something like this:

PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);

Unfortunately, I found that this would not work and that one of two problems occur when using this code:

The thread blocks on the call to PyGILState_Ensure()
The program crashes with a segmentation fault when the call to a Python method was invoked

Exactly which of these outcomes I experienced seemed to depend on whether I tried to explicitly call PyEval_ReleaseThread() from the main thread after doing the Py_Initialize() and other setup tasks. I tried various permutations of using PyGILState_Ensure()PyGILState_Release() and/or PyEval_SaveThread()/PyEval_ReleaseThread() but I always had one of the same problems. The next thing that occurred to me is that maybe PyCXX provides some framework for thread integration: I had a look through the code and couldn't find any reference to the threading functionality from the C API. I went looking for more articles and mailing list discussions and found implementation notes such as this one in Linux Journal and this wiki from the Blender developers. Most of them just appeared to be repeating what was in the manual, with a few subtle differences, but none of this provided an immediate solution. Eventually, I discovered this other blog about concurrency with embedded Python and it suggests something not highlighted in any of the other resources: calling PyThreadState_New(m_interpreterState) in each thread after it starts and before it does anything else. Combining this with the use of PyEval_SaveThread()/PyEval_ReleaseThread() fixed the problem: the use of PyThreadState_New() was not otherwise mentioned in the relevant section of the Python guide. I decided to take this solution a step further and create a convenient C++ class to encapsulate the logic, you can see this in PyThreadSupport.hxx:

class PyExternalUser
 
   public:
      PyExternalUser(PyInterpreterState* interpreterState)
       : mInterpreterState(interpreterState),
         mThreadState(PyThreadState_New(mInterpreterState))  ;
   class Use
    
      public:
         Use(PyExternalUser& user)
          : mUser(user)
           PyEval_RestoreThread(mUser.getThreadState());  ;
         ~Use()   mUser.setThreadState(PyEval_SaveThread());  ;
      private:
         PyExternalUser& mUser;
    ;
   friend class Use;
   protected:
      PyThreadState* getThreadState()   return mThreadState;  ;
      void setThreadState(PyThreadState* threadState)   mThreadState = threadState;  ;
   private:
      PyInterpreterState* mInterpreterState;
      PyThreadState* mThreadState;
 ;

and the way to use it is demonstrated in the PyRouteWorker class. Observe how PyExternalUser::Use is instantiated in the PyRouteWorker::process() method: when it goes out of scope (either due to a normal return, an error or an exception) the necessary call to PyEval_SaveThread() is made in the PyExternalUser::Use::~Use() destructor. Using other Python modules and DSO problems All of the above worked for basic Python such as this trivial example script:

def on_load():
    '''Do initialisation when module loads'''
    print 'example: on_load invoked'
def provide_route(method, request_uri, headers):
    '''Process a request URI and return the target URI(s)'''
    print 'example: method = ' + method
    print 'example: request_uri = ' + request_uri
    print 'example: From = ' + headers["From"]
    print 'example: To = ' + headers["To"]
    routes = list()
    routes.append('sip:bob@example.org')
    routes.append('sip:alice@example.org')
    return routes

However, it needs a more credible and useful test: using the python-ldap module to try and query an LDAP server appears like a good choice. Upon trying to use import ldap in the Python script, repro would refuse to load the Python script, choking on an error like this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ldap/__init__.py", line 22, in 
    import _ldap
ImportError: /usr/lib/python2.7/dist-packages/_ldap.so: undefined symbol: PyExc_SystemError

I looked at the file _ldap.so and discovered that it is linked with the LDAP libraries but not explicitly linked to any version of the Python runtime libraries. It expects the application hosting it to provide the Python symbols globally. In my own implementation, my embedded Python encapsulation code is provide as a DSO plugin, similar to the way plugins are loaded in Ganglia or Apache. The DSO links to Python: the DSO is loaded by a dlopen() call from the main process. The main repro binary has no direct link to Python libraries. Adding RTLD_GLOBAL to the top-level dlopen() call for loading the plugin is one way to ensure the Python symbols are made available to the Python modules loaded indirectly by the Python interpreter. This solution may be suitable for applications that don't mix and match many different components. Doing something useful with it Now it was all working nicely, I took a boilerplate LDAP Python example and used it for making a trivial script that converts sip:user@example.org to something like sip:9001@pbx.example.org, assuming that 9001 is the telephoneNumber associated with the user@ email address in LDAP. It is surprisingly simple and easily adaptable to local requirements depending upon the local LDAP structures:

import ldap
from urlparse import urlparse
def on_load():
    '''Do initialisation when module loads'''
    #print 'ldap router: on_load invoked'
def provide_route(method, request_uri, headers):
    '''Process a request URI and return the target URI(s)'''
    #print 'ldap router: request_uri = ' + request_uri
    _request_uri = urlparse(request_uri)
    routes = list()
    
    # Basic LDAP server parameters:
    server_uri = 'ldaps://ldap.example.org'
    base_dn = "dc=example,dc=org"
    # this domain will be appended to the phone numbers when creating
    # the target URI:
    phone_domain = 'pbx.example.org'
    # urlparse is not great for "sip:" URIs,
    # the user@host portion is in the 'path' element:
    filter = "(&(objectClass=inetOrgPerson)(mail=%s))" % _request_uri.path
    #print "Using filter: %s" % filter
    try:
        con = ldap.initialize(server_uri)
        scope = ldap.SCOPE_SUBTREE
        retrieve_attributes = None
        result_id = con.search(base_dn, scope, filter, retrieve_attributes)
        result_set = []
        while 1:
            timeout = 1
            result_type, result_data = con.result(result_id, 0, None)
            if (result_data == []):
                break
            else:
                if result_type == ldap.RES_SEARCH_ENTRY:
                    result_set.append(result_data)
        if len(result_set) == 0:
            #print "No Results."
            return routes
        for i in range(len(result_set)):
            for entry in result_set[i]:
                if entry[1].has_key('telephoneNumber'):
                    phone = entry[1]['telephoneNumber'][0]
                    routes.append('sip:' + phone + '@' + phone_domain)
    except ldap.LDAPError, error_message:
        print "Couldn't Connect. %s " % error_message
    return routes

Embedded Python opens up a world of possibilities After Ganglia 3.1.0 introduced an embedded Python scripting facility, dozens of new modules started appearing in github. Python scripting lowers the barrier for new contributors to a project and makes it much easier to fine tune free software projects to meet local requirements: hopefully we will see similar trends with the repro SIP proxy and other projects that choose Python. The code is committed here in the reSIProcate repository. These features will appear in the next beta release of reSIProcate and Debian packages will be available in unstable in a few days.

3 December 2013

Brett Parker: dd over ssh oddness

So, using the command:

root@new# ssh root@old dd if=/dev/vg/somedisk dd of=/dev/vg/somedisk

appears to fail, getting a SIGTERM at some point for no discernable reason... however, using

root@old# dd if=/dev/vg/somedisk ssh root@new dd of=/dev/vg/somedisk

works fine. The pull version fails at a fairly random point after a fairly undefined period of time. The push version works everytime. This is most confusing and odd... Dear lazyweb, please give me some new ideas as to what's going on, it's driving me nuts! Update: solved... A different daemon wasn't limiting it's killing habits in the case that a certain process wasn't running, and was killing the ssh process on the new server almost at random, found the bug in the code and now testing with that. Thanks for all the suggestions though, much appreciated.

17 November 2013

Gregor Herrmann: RC bugs 2013/46

not my most active RC bug squashing week but still, a few things done:

~~#713575~~ src:libgcal: "libgcal: FTBFS: ld: /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libcheck_pic.a(check_pack.o): undefined reference to symbol '__pthread_unregister_cancel@@GLIBC_2.3.3'"
add patch from Ubuntu / Andreas Moog, upload to DELAYED/2
~~#713662~~ src:heartbeat: "heartbeat: FTBFS: ucast.c:468:6: error: conflicting types for 'i'"
add patch from Ubuntu / Andres Rodriguez, upload to DELAYED/2
#718068 src:csh: "csh: FTBFS: Makefile:24: *** missing separator. Stop."
propose a different solution
~~#718098~~ src:menhir: "menhir: FTBFS: Makefile:34: *** Please define PREFIX. Stop."
"backport" fix from experimental, upload to DELAYED/2, then rescheduled to 0-day with maintainer's permission
#726410 src:spandsp: "spandsp: FTBFS in parallel mode"
tag unreproducible
~~#727806~~ tunnelx: "tunnelx: Depends on openjdk-6, which is going away"
update Debian-Java-Home in .manifest file; upload to DELAYED/2

10 November 2013

Gregor Herrmann: RC bugs 2013/45

here's the list of RC bugs I've worked on during the last week.

~~#713582~~ src:malaga: "malaga: FTBFS: ld: canvas.o: undefined reference to symbol 'ceil@@GLIBC_2.2.5'"
add patches from Ubuntu / Matthias Klose, upload to DELAYED/2
~~#713595~~ src:goocanvas: "goocanvas: FTBFS: ld: demo.o: undefined reference to symbol 'sincos@@GLIBC_2.2.5'"
add patch from Ubuntu / Matthias Klose, upload to DELAYED/2
#713635 src:osm-gps-map: "osm-gps-map: FTBFS: osm-gps-map-widget.c:778:21: error: format '%lld' expects argument of type 'long long int', but argument 4 has type 'goffset' [-Werror=format=]"
add patch from upstream git, package FTBFS again later
~~#713666~~ src:gnustep-base: "gnustep-base: FTBFS: GSXML.m:983:38: error: dereferencing pointer to incomplete type"
add patch from Ubuntu / Benjamin Drung, upload to DELAYED/2
~~#713700~~ src:gsnmp: "gsnmp: FTBFS: gsnmp.h:27:18: fatal error: gnet.h: No such file or directory"
apply patch from Ubuntu / Daniel T Chen, upload to DELAYED/2
~~#713701~~ src:libmatchbox: "libmatchbox: FTBFS: ld: dump-image.o: undefined reference to symbol 'XOpenDisplay'"
add patches from Ubuntu / Daniel T Chen, upload to DELAYED/2
~~#713708~~ src:gtkgl2: "gtkgl2: FTBFS: ld: zktor.o: undefined reference to symbol 'atan2@@GLIBC_2.2.5'"
apply patch from Ubuntu / Daniel T Chen, upload to DELAYED/2
~~#713727~~ src:xmlroff: "xmlroff: FTBFS: ld: ../libfo/.libs/libfo-0.6.a(fo-datatype.o): undefined reference to symbol 'floor@@GLIBC_2.2.5'"
apply patch from Ubuntu / Daniel T Chen, upload to DELAYED/2
~~#713728~~ src:matchbox-panel: "matchbox-panel: FTBFS: ld: mb-applet-menu-launcher.o: undefined reference to symbol 'XInternAtom'"
update patch to link against more libs, upload to DELAYED/2
~~#724080~~ src:libgraph-perl: "libgraph-perl: FTBFS: make: *** [clean] Error 1"
fix clean target in debian/rules, upload to DELAYED/2, then moved to 0-day with maintainer's consent
#724085 src:myspell: "myspell: FTBFS: POD errors"
prepare a patch
~~#724115~~ src:hunspell: "hunspell: FTBFS: POD error"
prepare a patch
#724142 src:xmltv: "xmltv: FTBFS: Tests failed"
investigate error
~~#724750~~ electrum: "electrum fails to launch from terminal"
add missing dependency, upload to DELAYED/2
#726801 biomaj: "biomaj: fails to install with Recommends enabled"
tag unreproducible, later further analyzed as a java dependency problem by follow DD

Next.

Previous.